72 research outputs found

    Machine Learning of Generic and User-Focused Summarization

    A key problem in text summarization is finding a salience function which determines what information in the source should be included in the summary. This paper describes the use of machine learning on a training corpus of documents and their abstracts to discover salience functions which describe what combination of features is optimal for a given summarization task. The method addresses both "generic" and user-focused summaries. Comment: In Proceedings of the Fifteenth National Conference on AI (AAAI-98), p. 821-82

    Animation Motion in NarrativeML

    This paper describes qualitative spatial representations relevant to cartoon motion incorporated into NarrativeML, an annotation scheme intended to capture some of the core aspects of narrative. These representations are motivated by linguistic distinctions drawn from cross-linguistic studies. Motion is modeled in terms of transitions in spatial configurations, using an expressive dynamic logic with the manner and path of motion being derived from a few basic primitives. The manner is elaborated to represent properties of motion that bear on character affect. Such representations can potentially be used to support cartoon narrative summarization and question-answering. The paper discusses annotation challenges and the use of computer vision to help in annotation. Work is underway on annotating a cartoon corpus in terms of this scheme.

    The Creeping Virtuality of Place

    Places are inherently dynamic. They also mediate between entities and events of significance to us, and space. They reflect a network of associations, involving landmarks deemed salient for various reasons. These are all properties assigned to a place by a speaker, and may or may not correspond to the properties assigned to a place by any other speaker. As a result, places have a subjective quality. These properties of dynamicity and subjectivity present interesting challenges when producing mashups that align different data sources. I propose addressing this by assuming that entities, following Hornsby & Egenhofer (2000), have histories, namely sequences of time intervals when they are predicated to exist. Places are entities with spatial properties that include topological relationships to other places, represented in terms of RCC-8 or the 9-intersection calculus, as well as distance and orientation relations. This spatio-temporal integration can avail of existing annotation schemes for space and time in natural language, but it leaves some open issues related to the representation of subjectivity.

    Chronoscopes: A theory of underspecified temporal representations

    Representation and reasoning about time and events is a fundamental aspect of our cognitive abilities and intrinsic to our construal of the structure of our personal and historical lives and recall of past experiences. This talk describes an abstract device called a Chronoscope that allows a temporal representation (a set of events and their temporal relations) to be viewed based on temporal abstractions. The temporal representation is augmented with abstract events called episodes that stand for discourse segments. The temporal abstractions allow one to collapse temporal relations, or view the representation at different time granularities (hour, day, month, year, etc.), with corresponding changes in event characterization and temporal relations at those granularities. A temporal representation can also be filtered to specify temporal trajectories of particular participants. Trajectories, in turn, can be intersected at various levels of granularity. Chronoscopes can be used to compare temporal representations (e.g., for aggregation, summarization, or evaluation purposes), as well as to help in the visualization of temporal narrative.

    Machine Learning of User Profiles: Representational Issues

    As more information becomes available electronically, tools for finding information of interest to users become increasingly important. The goal of the research described here is to build a system for generating comprehensible user profiles that accurately capture user interest with minimum user interaction. The research focuses on the importance of a suitable generalization hierarchy and representation for learning profiles which are predictively accurate and comprehensible. In our experiments we evaluated both traditional features based on weighted term vectors and subject features corresponding to categories which could be drawn from a thesaurus. Our experiments, conducted in the context of a content-based profiling system for on-line newspapers on the World Wide Web (the IDD News Browser), demonstrate the importance of a generalization hierarchy and the promise of combining natural language processing techniques with machine learning (ML) to address an information retrieval (IR) problem. Comment: 6 page

    Learning to match names across languages

    We report on research on matching names in different scripts across languages. We explore two trainable approaches based on comparing pronunciations. The first, a cross-lingual approach, uses an automatic name-matching program that exploits rules based on phonological comparisons of the two languages carried out by humans. The second, a monolingual approach, relies only on automatic comparison of the phonological representations of each pair. Alignments produced by each approach are fed to a machine learning algorithm. Results show that the monolingual approach yields machine-learning based comparison of person names in English and Chinese at an accuracy of over 97.0 F-measure.

    How to Evaluate your Question Answering System Every Day and Still Get Real Work Done

    In this paper, we report on Qaviar, an experimental automated evaluation system for question answering applications. The goal of our research was to find an automatically calculated measure that correlates well with human judges' assessment of answer correctness in the context of question answering tasks. Qaviar judges the response by computing recall against the stemmed content words in the human-generated answer key. It counts the answer correct if it exceeds a given recall threshold. We determined that the answer correctness predicted by Qaviar agreed with the human 93% to 95% of the time. 41 question-answering systems were ranked by both Qaviar and human assessors, and these rankings correlated with a Kendall's Tau measure of 0.920, compared to a correlation of 0.956 between human assessors on the same data. Comment: 6 pages, 3 figures, to appear in Proceedings of the Second International Conference on Language Resources and Evaluation (LREC 2000
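
    The recall-threshold judgment described in this abstract can be sketched roughly as follows. This is a minimal illustration, not Qaviar's actual implementation: the stopword list, the suffix-stripping `crude_stem` helper, the function names, and the default threshold of 0.5 are all assumptions, since the abstract names none of them.

```python
import re

# Hypothetical stopword list; the abstract does not say how content words are chosen.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "is", "was",
             "were", "are", "and", "or", "by", "for", "with"}


def crude_stem(word):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def content_stems(text):
    """Return the set of stemmed, non-stopword tokens in the text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {crude_stem(t) for t in tokens if t not in STOPWORDS}


def recall(response, answer_key):
    """Fraction of the answer key's stemmed content words found in the response."""
    key = content_stems(answer_key)
    if not key:
        return 0.0
    return len(key & content_stems(response)) / len(key)


def judge(response, answer_key, threshold=0.5):
    """Count the response correct if its recall exceeds the (assumed) threshold."""
    return recall(response, answer_key) > threshold
```

    For example, `recall("Armstrong was walking on the moon in 1969", "Neil Armstrong walked on the moon")` matches three of the four stemmed content words in the key, so `judge` accepts the response at the assumed threshold of 0.5.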

    Protein Name Tagging Guidelines: Lessons Learned

    Interest in information extraction from the biomedical literature is motivated by the need to speed up the creation of structured databases representing the latest scientific knowledge about specific objects, such as proteins and genes. This paper addresses the lack of a standard definition of the problem of protein name tagging. We describe the lessons learned in developing a set of guidelines and present the first set of inter-coder results, viewed as an upper bound on system performance. Problems coders face include: (a) the ambiguity of names that can refer to either genes or proteins; (b) the difficulty of getting the exact extents of long protein names; and (c) the complexity of the guidelines. These problems have been addressed in two ways: (a) defining the tagging targets as protein named entities used in the literature to describe proteins or protein-associated or -related objects, such as domains, pathways, expression or genes, and (b) using two types of tags, protein tags and long-form tags, with the latter being used to optionally extend the boundaries of the protein tag when the name boundary is difficult to determine. Inter-coder consistency across three annotators on protein tags on 300 MEDLINE abstracts is 0.868 F-measure. The guidelines and annotated datasets, along with automatic tools, are available for research use.